3 research outputs found

    Multi-Fidelity Multi-Armed Bandits Revisited

    Full text link
    We study the multi-fidelity multi-armed bandit (MF-MAB), an extension of the canonical multi-armed bandit (MAB) problem. MF-MAB allows each arm to be pulled with different costs (fidelities) and observation accuracy. We study both the best arm identification with fixed confidence (BAI) and the regret minimization objectives. For BAI, we present (a) a cost complexity lower bound, (b) an algorithmic framework with two alternative fidelity selection procedures, and (c) both procedures' cost complexity upper bounds. From both cost complexity bounds of MF-MAB, one can recover the standard sample complexity bounds of the classic (single-fidelity) MAB. For regret minimization of MF-MAB, we propose a new regret definition, prove its problem-independent regret lower bound Ω(K1/3Λ2/3)\Omega(K^{1/3}\Lambda^{2/3}) and problem-dependent lower bound Ω(KlogΛ)\Omega(K\log \Lambda), where KK is the number of arms and Λ\Lambda is the decision budget in terms of cost, and devise an elimination-based algorithm whose worst-cost regret upper bound matches its corresponding lower bound up to some logarithmic terms and, whose problem-dependent bound matches its corresponding lower bound in terms of Λ\Lambda

    Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs

    Full text link
    Recently, there has been extensive study of cooperative multi-agent multi-armed bandits where a set of distributed agents cooperatively play the same multi-armed bandit game. The goal is to develop bandit algorithms with the optimal group and individual regrets and low communication between agents. The prior work tackled this problem using two paradigms: leader-follower and fully distributed algorithms. Prior algorithms in both paradigms achieve the optimal group regret. The leader-follower algorithms achieve constant communication costs but fail to achieve optimal individual regrets. The state-of-the-art fully distributed algorithms achieve optimal individual regrets but fail to achieve constant communication costs. This paper presents a simple yet effective communication policy and integrates it into a learning algorithm for cooperative bandits. Our algorithm achieves the best of both paradigms: optimal individual regret and constant communication costs

    On-Demand Communication for Asynchronous Multi-Agent Bandits

    Full text link
    This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where agents operate asynchronously -- agent pull times and rates are unknown, irregular, and heterogeneous -- and face the same instance of a K-armed bandit problem. Agents can share reward information to speed up the learning process at additional communication costs. We propose ODC, an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times. ODC is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of agents. ODC is a generic protocol that can be integrated into most cooperative bandit algorithms without degrading their performance. We then incorporate ODC into the natural extensions of UCB and AAE algorithms and propose two communication-efficient cooperative algorithms. Our analysis shows that both algorithms are near-optimal in regret.Comment: Accepted by AISTATS 202
    corecore